Workshop: "Intro to Apache Spark"
The Introduction to Apache Spark tutorial is for developers who already work in Python, Java, or Scala and want to learn the core Spark APIs. The workshop features hands-on technical exercises that get you up to speed using Spark for data exploration, analysis, and building Big Data applications.
Topics covered include:
- creating a notebook or installing Spark locally
- pre-flight check: initial Spark coding exercise
- Spark Deconstructed: RDDs, lazy-eval, and what happens on a cluster
- A Brief History: motivations for Spark and its context in Big Data
- progressive coding exercises: WordCount (WC), Join, Workflow
- Spark Essentials: context, driver, transformations, actions, persistence, etc.
- combining SQL, Streaming, Machine Learning, and Graph for Unified Pipelines
- review/analysis of case studies for production deployments of Spark in industry
- further resources for learning about Spark, dev community, prep for certification exam, etc.
Prerequisites:
- some programming experience in Python, Java, or Scala
- some familiarity with Big Data use cases and issues
- laptop with wifi + browser
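For a feel of the lazy-evaluation and WordCount topics listed above, here is a minimal plain-Python sketch. It does not use Spark at all: `LazyPipeline`, `flat_map`, `count_by_value`, and `split_words` are hypothetical names invented for this illustration, and the class is only a toy analogy for how transformations are recorded first and run when an action is called. The workshop exercises use the real Spark APIs instead.

```python
from collections import Counter
from itertools import chain


class LazyPipeline:
    """Toy analogy for a lazily evaluated dataset (NOT the Spark API)."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    # "Transformations": only record the step and return a new pipeline.
    def map(self, fn):
        return LazyPipeline(self._source, self._steps + (("map", fn),))

    def flat_map(self, fn):
        return LazyPipeline(self._source, self._steps + (("flat_map", fn),))

    def filter(self, fn):
        return LazyPipeline(self._source, self._steps + (("filter", fn),))

    def _run(self):
        # Chain the recorded steps as iterators; items are produced on demand.
        items = iter(self._source)
        for kind, fn in self._steps:
            if kind == "map":
                items = map(fn, items)
            elif kind == "filter":
                items = filter(fn, items)
            else:  # flat_map
                items = chain.from_iterable(map(fn, items))
        return items

    # "Actions": these actually consume the data.
    def collect(self):
        return list(self._run())

    def count_by_value(self):
        return Counter(self._run())


calls = []  # records each time the splitting function really runs


def split_words(line):
    calls.append(line)
    return line.split()


lines = ["to be or not to be", "To be is to do"]
words = LazyPipeline(lines).flat_map(split_words).map(str.lower)

# No work has happened yet: the transformations were only recorded.
ran_before_action = list(calls)

# The action triggers the whole pipeline.
counts = words.count_by_value()
print(ran_before_action)           # empty list: nothing ran before the action
print(counts["to"], counts["be"])  # word counts after the action
```

The design point the sketch tries to convey is that building `words` costs nothing; the input is read and split only once `count_by_value()` asks for a result.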